New Quasi-Newton Optimization Methods for Machine Learning
Authors
Abstract
This thesis develops new quasi-Newton optimization methods that exploit the well-structured functional form of objective functions often encountered in machine learning, while still maintaining the solid foundation of the standard BFGS quasi-Newton method. In particular, our algorithms are tailored for two categories of machine learning problems: (1) regularized risk minimization problems with convex but nonsmooth objective functions, and (2) stochastic convex optimization problems that involve learning from small subsamples (mini-batches) of a potentially very large set of data. We first extend the classical BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of nonsmooth convex problems. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply the limited-memory variant of subBFGS (subLBFGS) to L2-regularized risk minimization with the binary hinge loss. To extend our algorithms to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that it can also extend a recently developed bundle method to the multiclass and multilabel settings. Moreover, we apply the direction-finding component of our algorithms to L1-regularized risk minimization with the logistic loss. In all these contexts our methods perform comparably to, or better than, specialized state-of-the-art solvers on a number of publicly available datasets. This thesis also provides stochastic variants of the BFGS method, in both full and memory-limited forms, for large-scale optimization of convex problems where the objective and gradient must be estimated from subsamples of the training data. The limited-memory variant of the resulting online BFGS algorithm performs comparably to a well-tuned natural gradient descent method but is scalable to very high-dimensional problems. On standard benchmarks in natural language processing it asymptotically outperforms previous stochastic gradient methods for parameter estimation in Conditional Random Fields.
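As a rough illustration of the stochastic quasi-Newton idea summarized above, the sketch below runs an L-BFGS-style two-loop recursion on mini-batch gradients of an L2-regularized logistic loss. The function names, the fixed step size eta, the history length m, and the damping constant lam are illustrative assumptions of this sketch, not the exact oBFGS/oLBFGS scheme developed in the thesis.

import numpy as np
from collections import deque

def minibatch_grad(w, X, y, reg=1e-4):
    """Gradient of an L2-regularized logistic loss on a mini-batch (labels in {-1, +1})."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))          # derivative of log(1 + exp(-margin))
    return X.T @ coeff / len(y) + reg * w

def two_loop(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns an approximate (quasi-)Newton direction."""
    q, alphas = grad.copy(), []
    for s, yv in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        rho = 1.0 / (yv @ s)
        a = rho * (s @ q)
        alphas.append((a, rho, s, yv))
        q -= a * yv
    if s_list:                                    # initial scaling gamma = s'y / y'y
        s, yv = s_list[-1], y_list[-1]
        q *= (s @ yv) / (yv @ yv)
    for a, rho, s, yv in reversed(alphas):        # oldest pair first
        b = rho * (yv @ q)
        q += (a - b) * s
    return -q

def online_lbfgs(X, y, dim, batch=32, steps=500, eta=0.1, m=10, lam=1e-2):
    """Stochastic L-BFGS sketch: the curvature pair is built from gradients taken
    on the same mini-batch, with a damping term lam*s (an assumption of this sketch)."""
    rng = np.random.default_rng(0)
    w = np.zeros(dim)
    s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        g = minibatch_grad(w, Xb, yb)
        d = two_loop(g, list(s_hist), list(y_hist))
        w_new = w + eta * d
        g_new = minibatch_grad(w_new, Xb, yb)     # re-evaluate on the same mini-batch
        s, yv = w_new - w, g_new - g + lam * (w_new - w)
        if yv @ s > 1e-10:                        # keep only safely positive curvature pairs
            s_hist.append(s); y_hist.append(yv)
        w = w_new
    return w

Evaluating both gradients on the same mini-batch keeps sampling noise from corrupting the gradient difference, and the damping term keeps the stored curvature pair positive; both are sketch-level choices made here for stability.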
Similar works
Quasi-Newton Methods: A New Direction
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression u...
Projected Newton-type Methods in Machine Learning
We consider projected Newton-type methods for solving large-scale optimization problems arising in machine learning and related fields. We first introduce an algorithmic framework for projected Newton-type methods by reviewing a canonical projected (quasi-)Newton method. This method, while conceptually pleasing, has a high computation cost per iteration. Thus, we discuss two variants that are m...
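As a minimal sketch of the canonical projected Newton-type step this abstract refers to, the code below scales the gradient by a diagonal Hessian approximation and projects the result onto the nonnegative orthant; the constraint set, the diagonal scaling, and the fixed step size eta are assumptions of the sketch, not the variants discussed in the paper.

import numpy as np

def projected_diag_newton(f_grad, f_hess_diag, w0, n_steps=100, eta=1.0):
    """Projected Newton-type sketch: scale the gradient by a (positive) diagonal
    Hessian approximation, take a step, then project onto the set w >= 0."""
    w = w0.copy()
    for _ in range(n_steps):
        g = f_grad(w)
        h = np.maximum(f_hess_diag(w), 1e-8)   # guard against vanishing curvature
        w = np.maximum(w - eta * g / h, 0.0)   # Euclidean projection onto the nonnegative orthant
    return w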
A quasi-Newton proximal splitting method
A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piecewise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and...
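For concreteness, one widely used proximity operator behind such proximal splitting schemes is soft thresholding, the prox of the L1 norm; the sketch below uses it inside a plain forward-backward (proximal-gradient) step. It does not reproduce the scaled-norm proximity result derived in the paper.

import numpy as np

def prox_l1(v, t):
    """Proximity operator of t*||.||_1: elementwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_step(w, grad, step, reg):
    """One forward-backward step: a gradient step on the smooth part,
    followed by the prox of the nonsmooth L1 term."""
    return prox_l1(w - step * grad, step * reg)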
A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning
We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the re...
Quasi-Newton Methods for Nonconvex Constrained Multiobjective Optimization
Here, a quasi-Newton algorithm for constrained multiobjective optimization is proposed. Under suitable assumptions, global convergence of the algorithm is established.